Fix Intel CPUID leaf 4 cache topology for SMT#1002
Fix Intel CPUID leaf 4 cache topology for SMT#1002glitzflitz wants to merge 3 commits intooxidecomputer:masterfrom
Conversation
|
oh nice, thanks! totally an oversight when I was in there. how visible is this under so, if this is legible in the guest, could you adjust that test to check (.. I also see that in retrospect that test assumes SMT siblings are adjacent in APIC ID, which is definitely wrong in general.) |
|
This is what I get before the patch which indeed shows each vCPU as its own private L1 and L2 cache. After the patch I get Let me add a test for this |
|
Also I just noticed the propolis/phd-tests/tests/src/cpuid.rs Line 326 in dacb53d shouldn't sibling_idx be idx/2 instead of idx/4?The thread_siblings documentation in linux is vague but I think it is a hex bitmask where bit N is set if CPU N is a sibling. With SMT, CPUs 0-1 set bits 0-1 which gives '3', CPUs 2-3 set bits 2-3 which gives 'c' and so on. Since each hex digit covers 4 CPUs and we are iterating over pairs of siblings, we go through 2 pairs before moving to the next hex digit, so idx/2 makes sense instead of idx/4? The current code would only advance sibling_idx every 8 CPUs.I just ran the test it fails for me locally. |
When SMT is enabled, L1/L2 caches should report being shared by 2 logical processors (the SMT siblings). Previously EAX[25:14] was always being set to 0, indicating no sharing which contradicts the SMT topology reported in leaf 0xB. As per [1] EAX[25:14] indicates maximum number of addressable IDs for logical processors sharing this cache. This mismatch causes linux guest to print "BUG: arch topology borken / the SMT domain not a subset of the CLS domain" during boot. Linux derives L2 cache sharing groups from leaf 4 and expects SMT siblings to share L2 but it was being informed that each vCPU has private L1/L2. This brings the SMT handling logic in CPUID inline with what being done for AMD in fix_amd_cache_topo() which sets the sharing count to 2 when has_smt is true. This fixes oxidecomputer#1001. [1]: Table 1-15. Reference for CPUID Leaf 04H https://cdrdv2-public.intel.com/775917/intel-64-architecture-processor-topology-enumeration.pdf Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
The existing test assertion would fail on hosts with SMT enabled due to incorrect index calculations. Also add has_smt() helper to skip thread_siblings checks on non-SMT hosts and remove the unused itertools import. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Verify that Linux guest observes correct cache sharing topology from /sys/devices/system/cpu/cpu0/cache/. With SMT enabled, L1 and L2 caches should report sharing by SMT siblings while L3 should be shared across all vCPUs. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
|
I fixed the |
|
Also noticed this late
Do you mean index3 😅/L3 which should be shared across all cores? Index 0 and 1 is split among L1i and L1d. I added test for L1 and L2 to be shared among SMT siblings and L3 to be among all CPUs in Add cache topology verification to guest_cpu_topo_test |
When SMT is enabled, L1/L2 caches should report being shared by 2 logical processors (the SMT siblings). Previously EAX[25:14] was always being set to 0, indicating no sharing which contradicts the SMT topology reported in leaf
0xB. As per [1] EAX[25:14] indicates maximum number of addressable IDs for logical processors sharing this cache.This mismatch causes linux guest to print "BUG: arch topology borken / the SMT domain not a subset of the CLS domain" during boot. Linux derives L2 cache sharing groups from leaf 4 and expects SMT siblings to share L2 but it was being informed that each vCPU has private L1/L2.
This brings the SMT handling logic in CPUID inline with whats being done for AMD in
fix_amd_cache_topo()which sets the sharing count to 2 when has_smt is true. This fixes #1001.[1]: Table 1-15. Reference for CPUID Leaf 04H
https://cdrdv2-public.intel.com/775917/intel-64-architecture-processor-topology-enumeration.pdf